The goal of this study is to examine the impact of certain variables on the climate, using the Air Quality Index (AQI) of counties across the United States of America as reported in data collected by the EPA.
There are two smaller sub-studies in this presentation: one examining the effects of the Climate Alliance legislative program, and another examining the correlation between county characteristics and air quality.
To begin, we read in the data from the EPA datasets.
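As a rough sketch of this loading step, the snippet below reads a small inline sample shaped like an annual AQI-by-county table and averages the median AQI per state and year. The column names (`State`, `County`, `Year`, `Median.AQI`) and all values are assumptions made for illustration; only the names `mean.state.df` and `mean.state` come from the model output later in this report, and how the authors actually built them is not shown.

```r
# Tiny inline sample standing in for the EPA annual AQI-by-county files.
# Column names and values are illustrative assumptions, not the real data.
csv_text <- "State,County,Year,Median.AQI
Alabama,Baldwin,2018,38
Alabama,Clay,2018,42
Alabama,Baldwin,2019,36
Alabama,Clay,2019,40
Oregon,Lane,2018,30
Oregon,Lane,2019,34"

aqi <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Average the county-level median AQI within each state and year.
mean.state.df <- aggregate(Median.AQI ~ State + Year, data = aqi, FUN = mean)
names(mean.state.df)[names(mean.state.df) == "Median.AQI"] <- "mean.state"

print(mean.state.df)
```

With the real files, the inline text would be replaced by a call such as `read.csv()` on each downloaded yearly file, with the yearly tables stacked using `rbind()` before aggregating.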
## [1] 85.3
The six most common and harmful air pollutants, which the EPA designates as criteria air pollutants, are ozone, nitrogen dioxide, sulfur dioxide, lead, carbon monoxide, and particulate matter.
The plots show that concentrations of these pollutants have gradually decreased over time or, in a few cases, have remained the same. The regions where the data vary most are the West, the Southwest, and the Rockies.
##
## Call:
## lm(formula = mean.state ~ is.climate.alli + Year, data = mean.state.df)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -18.67  -3.11   0.64   3.05  45.96
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)        1171.265    167.203    7.01  7.3e-12 ***
## is.climate.alliyes    0.955      0.526    1.81     0.07 .
## Year                 -0.564      0.083   -6.79  2.9e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.15 on 547 degrees of freedom
## Multiple R-squared: 0.0829, Adjusted R-squared: 0.0795
## F-statistic: 24.7 on 2 and 547 DF, p-value: 5.31e-11
##
## Call:
## lm(formula = delta.aqi.state ~ is.climate.alli + Year, data = mean.state.df)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -33.58  -1.26   0.06   1.36  13.30
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)         81.8044    88.4278    0.93     0.36
## is.climate.alliyes  -0.1220     0.2528   -0.48     0.63
## Year                -0.0409     0.0439   -0.93     0.35
##
## Residual standard error: 2.82 on 497 degrees of freedom
## (50 observations deleted due to missingness)
## Multiple R-squared: 0.00221, Adjusted R-squared: -0.00181
## F-statistic: 0.55 on 2 and 497 DF, p-value: 0.577
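In the first model, the coefficient for Climate Alliance membership is positive (0.955), so member states have a slightly higher mean AQI after adjusting for year. Because a higher AQI indicates worse air quality, this does not show better air in member states, and the difference is not significant at the 5% level (p = 0.07).
This might be because the Climate Alliance only went into effect 3 years ago, in 2017. Note that in the second model Climate Alliance states show slightly larger year-over-year improvements in AQI on average (coefficient -0.122), although this difference is also not significant (p = 0.63).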
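The two regressions above can be reproduced in outline on synthetic data. The sketch below is not the authors' code: the state labels, the membership assignment, and the AQI values are made up, and the way `delta.aqi.state` is computed (a within-state year-over-year difference) is an assumption. Under that assumption, a panel of 50 states over 11 years gives 550 rows and loses exactly one row per state in the second model, which would be consistent with the "50 observations deleted due to missingness" note and the residual degrees of freedom (547 and 497) reported above.

```r
set.seed(1)  # reproducible synthetic data

# Synthetic state-year panel: 50 hypothetical states observed 2010-2020.
states <- sprintf("S%02d", 1:50)
panel <- expand.grid(Year = 2010:2020, state = states,
                     stringsAsFactors = FALSE)

# Hypothetical membership flag: the first 25 states are "members".
panel$is.climate.alli <- factor(
  ifelse(panel$state %in% states[1:25], "yes", "no"),
  levels = c("no", "yes")
)

# Made-up mean AQI with a downward time trend plus noise.
panel$mean.state <- 45 - 0.5 * (panel$Year - 2010) + rnorm(nrow(panel), sd = 5)

# Year-over-year change within each state; the first year of each state
# has no predecessor, so its change is NA.
panel <- panel[order(panel$state, panel$Year), ]
panel$delta.aqi.state <- ave(panel$mean.state, panel$state,
                             FUN = function(x) c(NA, diff(x)))

# Same model formulas as in the output above, fitted to the synthetic panel.
fit.level <- lm(mean.state ~ is.climate.alli + Year, data = panel)
fit.delta <- lm(delta.aqi.state ~ is.climate.alli + Year, data = panel)

# Rows that lm() silently drops from the delta model.
n.missing <- sum(is.na(panel$delta.aqi.state))
print(n.missing)
print(coef(fit.level))
```

In both models the coefficient named `is.climate.alliyes` is the estimated difference between member and non-member states at a given year, so its sign has to be read against the response: a positive value means a higher AQI in the first model and a less negative change in the second.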
Using data published by the USDA’s Economic Research Service (ERS), we look for county-level predictors of air quality. We merge the 2019 AQI data with the latest USDA ERS data, using 2019 rather than 2020 so that the 2020 West Coast fires do not skew the results.
We first merge all three sets of ERS county data with one another, and then merge the result with the AQI data by county and state.
We keep only observations from 2019 so that every county is measured in the same year.
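The merge described above might look roughly like the following. The three table names (`jobs`, `people`, `income`), the key columns `State` and `County`, and all values are hypothetical stand-ins; the variable names `UnempRate2019`, `Ed4AssocDegreePct`, `Deep_Pov_All`, and `med.aqi` are taken from the model output in this report.

```r
# Hypothetical stand-ins for three ERS county tables and the AQI table.
jobs   <- data.frame(State = c("AL", "AL", "OR"),
                     County = c("Baldwin", "Clay", "Lane"),
                     UnempRate2019 = c(2.7, 3.1, 3.9))
people <- data.frame(State = c("AL", "AL", "OR"),
                     County = c("Baldwin", "Clay", "Lane"),
                     Ed4AssocDegreePct = c(9.1, 8.4, 9.8))
income <- data.frame(State = c("AL", "OR"),
                     County = c("Baldwin", "Lane"),
                     Deep_Pov_All = c(4.2, 7.5))
aqi    <- data.frame(State = c("AL", "AL", "OR", "OR"),
                     County = c("Baldwin", "Baldwin", "Lane", "Lane"),
                     Year = c(2019, 2020, 2019, 2020),
                     med.aqi = c(36, 35, 34, 61))

# Merge the three ERS tables on the shared state and county keys.
keys <- c("State", "County")
ers <- Reduce(function(a, b) merge(a, b, by = keys),
              list(jobs, people, income))

# Keep only the 2019 AQI rows, then join them to the ERS data.
aqi.2019 <- aqi[aqi$Year == 2019, ]
county.df <- merge(aqi.2019, ers, by = keys)

print(county.df)
```

Because `merge()` performs an inner join by default, any county missing from one of the tables (here the hypothetical Clay County, absent from `income`) drops out of the result, so it is worth comparing row counts before and after each merge.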
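We split the cleaned and merged dataset into a predictor matrix X and a response vector Y for use with cv.glmnet. We call set.seed(1) beforehand so that the random cross-validation folds, and therefore the results, are reproducible.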
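A minimal sketch of the X/Y split and the cross-validated fit follows, using synthetic data in place of the merged county table. The data frame `county.df`, its values, and the choice of `alpha = 1` (the LASSO, which is also the `cv.glmnet` default) are assumptions; the report does not show the actual call. The fit is wrapped in a `requireNamespace()` check so the snippet still runs where glmnet is not installed.

```r
set.seed(1)  # makes the synthetic data and the CV fold assignment reproducible

# Synthetic stand-in for the cleaned county data (values are made up).
n <- 200
county.df <- data.frame(
  med.aqi       = rnorm(n, mean = 40, sd = 8),
  UnempRate2019 = runif(n, 2, 8),
  FemaleHHPct   = runif(n, 5, 20),
  State         = factor(sample(c("AL", "OR", "TX"), n, replace = TRUE))
)

# X: numeric design matrix (factors expanded to dummy columns, intercept
# column dropped because glmnet adds its own). Y: the response vector.
X <- model.matrix(med.aqi ~ ., data = county.df)[, -1]
Y <- county.df$med.aqi

# Cross-validated LASSO (alpha = 1), run only if glmnet is installed.
if (requireNamespace("glmnet", quietly = TRUE)) {
  cv.fit <- glmnet::cv.glmnet(X, Y, alpha = 1)
  # Coefficients at the lambda with the lowest cross-validated error.
  print(coef(cv.fit, s = "lambda.min"))
}
```

`cv.glmnet` needs a numeric matrix rather than a data frame, which is why `model.matrix()` is used to expand the state factor into dummy columns.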
## Anova Table (Type II tests)
##
## Response: med.aqi
## Sum Sq Df F value Pr(>F)
## UnempRate2020 26 1 0.25 0.61643
## PctEmpChange1920 119 1 1.14 0.28520
## UnempRate2019 757 1 7.26 0.00719 **
## UnempRate2017 845 1 8.10 0.00451 **
## PctEmpAgriculture 60 1 0.58 0.44821
## PctEmpMining 0 1 0.00 0.98589
## PctEmpConstruction 227 1 2.17 0.14060
## PctEmpManufacturing 6 1 0.05 0.81835
## PctEmpTrans 1 1 0.01 0.93250
## UnempRate2012 15 1 0.15 0.70224
## UnempRate2009 336 1 3.22 0.07303 .
## PopChangeRate1819 8 1 0.08 0.77593
## NetMigrationRate1019 388 1 3.72 0.05400 .
## NaturalChangeRate1019 600 1 5.75 0.01671 *
## Net_International_Migration_Rate_2010_2019 128 1 1.23 0.26829
## NetMigrationRate0010 259 1 2.48 0.11541
## NaturalChangeRate0010 286 1 2.74 0.09808 .
## Immigration_Rate_2000_2010 25 1 0.24 0.62643
## BlackNonHispanicPct2010 387 1 3.71 0.05451 .
## AsianNonHispanicPct2010 1 1 0.01 0.90762
## NativeAmericanNonHispanicPct2010 194 1 1.86 0.17286
## MultipleRacePct2010 28 1 0.27 0.60379
## NonHispanicBlackPopChangeRate0010 592 1 5.67 0.01743 *
## NonHispanicAsianPopChangeRate0010 508 1 4.87 0.02758 *
## HispanicPopChangeRate0010 365 1 3.50 0.06169 .
## MultipleRacePopChangeRate0010 22 1 0.22 0.64263
## WhiteNonHispanicNum2010 503 1 4.82 0.02830 *
## MultipleRaceNum2010 207 1 1.98 0.15933
## ForeignBornEuropePct 85 1 0.81 0.36834
## ForeignBornMexPct 300 1 2.88 0.09028 .
## Ed1LessThanHSPct 268 1 2.57 0.10931
## Ed2HSDiplomaOnlyPct 619 1 5.93 0.01503 *
## Ed3SomeCollegePct 0 1 0.00 0.98653
## Ed4AssocDegreePct 1264 1 12.12 0.00052 ***
## FemaleHHPct 1200 1 11.50 0.00072 ***
## HH65PlusAlonePct 233 1 2.23 0.13535
## ForeignBornCaribPct 0 1 0.00 0.96018
## ForeignBornAfricaNum 253 1 2.42 0.11981
## ForeignBornMexNum 3091 1 29.63 6.6e-08 ***
## LandAreaSQMiles2010 7 1 0.07 0.79113
## Deep_Pov_All 115 1 1.10 0.29457
## Residuals 100680 965
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We then remove the least significant variables from the model to see which factors remain.
## [1] -2.17e+00 -1.93e+00 1.68e+00 -2.23e+00 3.59e+00 3.14e+00 -3.39e+00
## [8] 9.90e+00 8.89e-06 6.73e-05 2.04e-01 1.84e-01 1.68e-01 6.36e-02
## [15] 1.17e-01 1.60e-01 2.35e-05 1.64e-06 5.59e-03 4.29e-03 3.53e-03
## [22] 6.94e-02 4.57e-02 1.73e-01 4.87e-02 1.48e-01 6.97e-01 7.19e-01
## [29] 3.82e+00 5.72e-05
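The variable-removal step is not spelled out in this report. One common way to do it, shown here purely as an assumed illustration on synthetic data, is backward elimination: refit the model repeatedly, each time dropping the term with the largest p-value until all remaining terms fall below a threshold. The threshold of 0.05 and the variable names `x1` to `x4` are hypothetical. For a model with main effects only, the F tests from `drop1()` test each term after all the others, as the Type II tests in the ANOVA table above do.

```r
set.seed(1)  # reproducible synthetic data

# Synthetic data: y depends on x1 and x2; x3 and x4 are pure noise.
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 2 * d$x1 - 1.5 * d$x2 + rnorm(n)

fit <- lm(y ~ ., data = d)

# Backward elimination: repeatedly drop the term with the largest p-value
# until every remaining term is below the (assumed) 0.05 threshold.
alpha <- 0.05
repeat {
  tests <- drop1(fit, test = "F")      # one F test per droppable term
  pvals <- tests[["Pr(>F)"]][-1]       # first row is "<none>"
  terms.left <- rownames(tests)[-1]
  if (length(pvals) == 0 || max(pvals) < alpha) break
  worst <- terms.left[which.max(pvals)]
  fit <- update(fit, as.formula(paste(". ~ . -", worst)))
}

print(formula(fit))
print(coef(fit))
```

P-values from a model selected this way are optimistic, since the same data are used both to choose the variables and to test them, so they are better read as a ranking than as formal evidence.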
From the final model, we see that most of the impact on AQI is geographical. For example, the increase in AQI associated with ForeignBornMexNum and NetMigrationNum could signal that states closer to the Mexican border tend to have worse AQIs due to their location. However, the clearest predictors are the states themselves.
The linear model assumptions appear to hold, except for observations more than about 1 standard deviation below the mean.
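A check of this kind is usually made from the standardized residuals and their normal Q-Q plot. The sketch below assumes that is what was done here and runs on a synthetic fit; the model, the data, and the 0.5 cutoff used to flag points that stray from the reference line are all illustrative choices, not taken from the report.

```r
set.seed(1)  # reproducible synthetic data

# Synthetic fit standing in for the final county model.
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)
fit <- lm(y ~ x, data = d)

# Standardized residuals and their theoretical normal quantiles.
res <- rstandard(fit)
qq <- qqnorm(res, plot.it = FALSE)

# Share of lower-tail points (theoretical quantile below -1) that sit more
# than 0.5 away from the reference line y = x; the cutoff is arbitrary.
lower <- qq$x < -1
frac.off <- mean(abs(qq$y[lower] - qq$x[lower]) > 0.5)
print(frac.off)
```

For a visual version of the same check, `plot(fit, which = 1:2)` draws the residuals-versus-fitted plot and the normal Q-Q plot for a fitted `lm` object.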
## IncNodePurity
## UnempRate2020 1758
## PctEmpChange1920 2757
## UnempRate2019 1666
## UnempRate2017 1800
## PctEmpAgriculture 2646
## PctEmpMining 2029
## PctEmpConstruction 1628
## PctEmpManufacturing 1559
## PctEmpTrans 1878
## UnempRate2012 1669
## UnempRate2009 1723
## PopChangeRate1819 1379
## NetMigrationRate1019 1808
## NaturalChangeRate1019 1746
## Net_International_Migration_Rate_2010_2019 2202
## NetMigrationRate0010 1896
## NaturalChangeRate0010 2041
## Immigration_Rate_2000_2010 2332
## BlackNonHispanicPct2010 2349
## AsianNonHispanicPct2010 2058
## NativeAmericanNonHispanicPct2010 1699
## MultipleRacePct2010 1825
## NonHispanicBlackPopChangeRate0010 1911
## NonHispanicAsianPopChangeRate0010 2178
## HispanicPopChangeRate0010 1929
## MultipleRacePopChangeRate0010 1809
## WhiteNonHispanicNum2010 2958
## MultipleRaceNum2010 2800
## ForeignBornEuropePct 1575
## ForeignBornMexPct 2342
## Ed1LessThanHSPct 1837
## Ed2HSDiplomaOnlyPct 2553
## Ed3SomeCollegePct 1971
## Ed4AssocDegreePct 3102
## FemaleHHPct 1913
## HH65PlusAlonePct 2132
## ForeignBornCaribPct 1513
## ForeignBornAfricaNum 2123
## ForeignBornMexNum 2621
## LandAreaSQMiles2010 2009
## Deep_Pov_All 1913
## $names
## [1] "call" "type" "predicted" "mse"
## [5] "rsq" "oob.times" "importance" "importanceSD"
## [9] "localImportance" "proximity" "ntree" "mtry"
## [13] "forest" "coefs" "y" "test"
## [17] "inbag" "terms"
##
## $class
## [1] "randomForest.formula" "randomForest"
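The importance table above has the shape produced by a regression random forest. As a hedged sketch on synthetic data, the snippet below fits such a forest and ranks predictors by `IncNodePurity`; the data values, the three predictors chosen, and `ntree = 500` are assumptions, and the call is guarded so the snippet still runs where the randomForest package is not installed.

```r
set.seed(1)  # random forests are stochastic, so fix the seed

# Synthetic stand-in for the county data (values are made up).
n <- 200
county.df <- data.frame(
  UnempRate2019     = runif(n, 2, 8),
  Ed4AssocDegreePct = runif(n, 4, 14),
  FemaleHHPct       = runif(n, 5, 20)
)
county.df$med.aqi <- 30 + 1.5 * county.df$UnempRate2019 + rnorm(n, sd = 4)

if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(med.aqi ~ ., data = county.df, ntree = 500)

  # For regression forests the default importance measure is IncNodePurity:
  # the total decrease in residual sum of squares from splits on a variable.
  imp <- randomForest::importance(rf)
  print(imp[order(imp[, "IncNodePurity"], decreasing = TRUE), , drop = FALSE])
}
```

Sorting the table makes it easier to compare with the linear model: in the unsorted output above, Ed4AssocDegreePct (3102) and WhiteNonHispanicNum2010 (2958) have the highest node purity values.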
The overall objective of this study was to use the AQI of counties across the USA to determine the impact of variables on the climate. Using data collected by the EPA, we were able to focus on the effect of the Climate Alliance on curbing the deterioration of the AQI across the nation, as well as the correlation between aspects of counties and their air quality.
From this study, we were able to conclude that the Climate Alliance has not yet had a measurable effect on the AQI of member states. Member states show slightly larger average year-over-year improvements in AQI than other states, but neither that difference nor the difference in mean AQI is statistically significant. We were also able to see that, based on the significant variables of the model, most of the impact on the AQI is geographical.